1. SUMMARY OF RESEARCH PROGRESS AND RESULTS

During the five years supported by this grant, we have made significant progress
both in the areas we proposed to investigate and in related areas. In this section,
we summarize the progress that has resulted in publications.

1.1.  Stochastic  Control. 

1.1.1.  Stochastic  Control  of  Markov  Processes. 

We have begun a research program in a major new area involving adaptive estimation
and control problems for stochastic systems with either incomplete (or noisy)
observations of the state or nonlinear dynamics. The first class of problems we have
studied involves finite state Markov chains with incomplete state observations and
unknown parameters; in particular, we have studied certain classes of quality
control, replacement, and repair problems. However, we found that work remained to
be done for such problems even with known parameters; this problem was studied in
[23] and [44]. In these papers, we consider partially observable Markov decision
processes with finite or countable (core) state and observation spaces and a finite
action set. Following a standard approach, an equivalent completely observed problem
is formulated, with the same finite action set but with an uncountable state space,
namely the space of probability distributions on the original core state space. It
is observed that certain characteristics of the original problem that stem from the
finiteness or countability of its spaces carry over to the equivalent problem.
Sufficient conditions are then derived for the existence of a bounded solution to
the average cost optimality equation. We illustrate these results in the context of
machine replacement problems. By exploiting the inherent convexity of the partially
observed problem, we obtain structural properties of average cost optimal policies
for a two-state replacement problem; these are similar to results available for
discount optimal policies. In particular, we show that the optimal policy has the
“control limit” or “bang-bang” form. The set of assumptions used appears to be
significantly less restrictive than others currently available. In [25], necessary
conditions are given for the existence of a bounded solution of the average cost
optimality equation. In [46], we consider average cost Markov decision processes on
a countable state space with unbounded costs. Under a condition on the cost that
penalizes unstable behavior, we establish the existence of a stable stationary
strategy that is strong average optimal.
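
In the equivalent completely observed problem described above, the state is the
conditional distribution (belief) of the core state given past observations and
actions, and it evolves by Bayes' rule. The following Python sketch shows one step
of that recursion for a finite core state space; the matrices P and Q and the
two-state numbers are hypothetical illustrations, not taken from [23] or [44].

```python
from typing import List, Sequence


def belief_update(belief: Sequence[float],
                  P: Sequence[Sequence[float]],
                  Q: Sequence[Sequence[float]],
                  obs: int) -> List[float]:
    """One step of the belief (filter) recursion for a finite-state POMDP.

    belief[i] : current probability that the core state is i
    P[i][j]   : transition probability i -> j under the action just taken
    Q[j][y]   : probability of observing y when the new core state is j
    """
    n = len(belief)
    # Prediction: propagate the distribution through the transition matrix.
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight by the likelihood of the observation, then renormalize.
    weighted = [predicted[j] * Q[j][obs] for j in range(n)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [w / total for w in weighted]


# Hypothetical two-state machine: state 0 = good, state 1 = bad (absorbing).
P = [[0.9, 0.1], [0.0, 1.0]]
Q = [[0.8, 0.2], [0.3, 0.7]]   # observation 1 is a "looks bad" signal
print(belief_update([1.0, 0.0], P, Q, obs=1))  # approximately [0.72, 0.28]
```

The belief produced by this map is the state of the equivalent problem, which is why
its state space is the (uncountable) set of probability vectors.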

As a prelude to studying adaptive control, we consider in [26], [43] the problem of
characterizing the effects that uncertainties and/or small changes in the parameters
of a model can have on optimal policies. It is shown that changes in the optimal
policy are very difficult to detect, even for relatively simple models. By showing
that, for a machine replacement problem modeled by a partially observed, finite state
Markov decision process, the infinite horizon optimal discounted cost function is
piecewise linear, we have derived formulas for the optimal cost and the optimal
policy, thus providing a means of carrying out sensitivity analyses. This
work  is  extended  in  [24]  to  several  other  classes  of  problems,  including  an  inspection 
problem  with  standby  units,  an  optimal  stopping  problem,  input  optimization  for 
infinite  horizon  programs,  and  Markov  decision  processes  with  lagged  information. 
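
When an optimal cost function is piecewise linear, it can be written as the minimum
of finitely many affine functions of the belief, and quantities such as the belief at
which the minimizing piece changes follow from elementary algebra; this is what makes
explicit sensitivity formulas possible. The Python sketch below assumes a two-state
problem, so the belief is summarized by p = P(machine is bad), and uses two
hypothetical pieces labeled "continue" and "replace"; the numbers are illustrative,
not results from [26], [43].

```python
from typing import Sequence, Tuple

Piece = Tuple[float, float]  # (intercept a_k, slope b_k) of the affine function a_k + b_k * p


def pl_cost(p: float, pieces: Sequence[Piece]) -> Tuple[float, int]:
    """Evaluate V(p) = min_k (a_k + b_k * p); also return the index of the minimizing piece."""
    values = [a + b * p for a, b in pieces]
    k = min(range(len(values)), key=values.__getitem__)
    return values[k], k


def switch_point(piece1: Piece, piece2: Piece) -> float:
    """Belief p at which two affine pieces are equal (their slopes must differ)."""
    (a1, b1), (a2, b2) = piece1, piece2
    return (a2 - a1) / (b1 - b2)


# Hypothetical pieces: continuing costs 10 * p, replacing costs a flat 4.
continue_piece, replace_piece = (0.0, 10.0), (4.0, 0.0)
print(switch_point(continue_piece, replace_piece))  # 0.4
# Sensitivity: raising the replacement cost to 4.5 moves the switch point to 0.45.
print(switch_point(continue_piece, (4.5, 0.0)))     # 0.45
```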

We have studied in [27] controlled diffusion processes on an infinite horizon with
three non-standard cost criteria: weighted cost, variance sensitive cost, and
overtaking optimality. Under a stability assumption, we establish the existence of
stationary Markov controls that are optimal for these criteria in certain classes.
Also, under very general conditions, we establish the existence of an ε-optimal
Markov policy for the weighted criterion.


1.1.2. Stochastic Adaptive Estimation and Control.

In [19], [29], and [38], we study the adaptive estimation of the state of a finite
state Markov chain with incomplete state observations, in which the state transition
probabilities depend on unknown parameters. A new adaptive estimation algorithm for
finite state Markov processes with incomplete observations is developed. This
algorithm is then analyzed via the Ordinary Differential Equation (ODE) method; that
is, it is shown that the convergence of the parameter estimation algorithm can be
analyzed by studying an “averaged” ordinary differential equation. The most crucial
and difficult aspect of the proof is showing that, for each value of the unknown
parameter, an augmented Markov process has a unique invariant measure. New
techniques for the analysis of the ergodicity of time-varying Markov chains are
utilized. The convergence of the recursive parameter estimates is studied, and the
optimality of the adaptive state estimator is proved.
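
The idea behind the ODE method can be seen in a much simpler setting than the one
treated in [19], [29], [38]. The Python sketch below assumes the state is fully
observed and estimates a single transition probability theta by a
stochastic-approximation recursion; the averaged ODE d(theta)/dt = theta_true - theta
has theta_true as its unique, globally stable equilibrium, which is what predicts
convergence of the recursion. The augmented-process and ergodicity issues that make
the actual analysis hard do not arise in this toy version.

```python
import random


def recursive_estimate(theta_true: float, steps: int, seed: int = 0) -> float:
    """Stochastic-approximation estimate of a one-step transition probability.

    Each step observes Z ~ Bernoulli(theta_true) (e.g. "good -> bad in one epoch")
    and applies theta <- theta + a_n * (Z - theta) with gains a_n = 1 / (n + 1).
    The averaged ODE is d(theta)/dt = theta_true - theta.
    """
    rng = random.Random(seed)
    theta = 0.5  # initial guess; with a_0 = 1 it is replaced after the first step
    for n in range(steps):
        z = 1.0 if rng.random() < theta_true else 0.0
        theta += (z - theta) / (n + 1)
    return theta


def averaged_ode(theta_true: float, theta0: float, t_end: float, dt: float = 1e-3) -> float:
    """Euler integration of the averaged ODE, for comparison with the recursion."""
    theta = theta0
    for _ in range(int(t_end / dt)):
        theta += dt * (theta_true - theta)
    return theta


print(recursive_estimate(0.1, 100_000))  # close to 0.1
print(averaged_ode(0.1, 0.5, 10.0))      # close to 0.1
```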

We have begun to apply similar methods to adaptive stochastic control problems with
incomplete observations. We first considered a quality control problem in which a
system, such as a manufacturing unit or a computer communications network, can be in
either of two states: good or bad. The control actions available to the
inspector/decision-maker are:

(a) produce without inspection;

(b)  produce  and  inspect;  or 

(c)  repair. 


Under  actions  (a)  and  (b)  the  system  is  subject  to  Markovian  deterioration,  while 
a  repair  puts  the  unit  in  the  good  state  by  the  next  decision  time.  Informative  data 
might  become  available  while  producing  without  inspection,  and  inspection  is  not 
always perfect. Hence the problem is modeled as a partially observed Markov decision
process (POMDP). Furthermore, we assume that deterioration of the system
depends  on  an  unknown  parameter,  namely  the  probability  of  the  state  going  from 
the  good  to  the  bad  state  in  one  time  epoch  when  no  repair  is  done.  For  the  case  of 
known  parameters,  we  have  shown  (see  above)  that  there  is  an  optimal  policy  for  the 
infinite  horizon  average  cost  criterion  that  is  of  the  control  limit  (bang-bang)  type. 
The  adaptive  stochastic  control  problem  is,  however,  much  more  difficult  than  the 
adaptive  estimation  problem,  because  the  presence  of  feedback  causes  the  system 
transitions  to  depend  on  the  parameter  estimates  and  introduces  discontinuities. 
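
The quality control problem above can be summarized by the posterior probability p
that the system is bad. The Python sketch below encodes a threshold rule on p over the
three actions, together with the prediction of p between observations. The threshold
values, the ordering of the three actions along p, and the assumption that the bad
state is absorbing unless repaired are illustrative assumptions; the structural result
cited above concerns the form of the optimal policy, not these particular numbers.
Updating p after an inspection or other informative data would require a Bayes
correction step, which is omitted here.

```python
from enum import Enum


class Action(Enum):
    PRODUCE = "produce without inspection"
    INSPECT = "produce and inspect"
    REPAIR = "repair"


def control_limit_policy(p_bad: float, t_inspect: float, t_repair: float) -> Action:
    """Threshold rule on p_bad = P(system is bad); assumes 0 <= t_inspect <= t_repair <= 1."""
    if p_bad >= t_repair:
        return Action.REPAIR
    if p_bad >= t_inspect:
        return Action.INSPECT
    return Action.PRODUCE


def predict_p_bad(p_bad: float, theta: float, action: Action) -> float:
    """P(bad) at the next epoch before any new data arrive.

    theta = P(good -> bad in one epoch without repair); 'bad' is assumed absorbing
    unless repaired, and a repair returns the system to 'good' by the next epoch.
    """
    if action is Action.REPAIR:
        return 0.0
    return p_bad + (1.0 - p_bad) * theta


# With no informative data, p_bad drifts upward until the policy intervenes.
p, theta = 0.0, 0.1
for epoch in range(12):
    a = control_limit_policy(p, t_inspect=0.3, t_repair=0.6)
    print(epoch, round(p, 4), a.name)
    p = predict_p_bad(p, theta, a)
```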

In  [45]  and  [47],  we  have  analyzed  algorithms  for  this  quality  control  problem 
in  which  the  parameter  estimates  are  updated  only  after  the  system  is  repaired; 
such  algorithms  are  analogous  to  those  in  which  estimates  in  queueing  systems  are 
updated  only  after  each  busy  cycle.  Since  the  system  is  returned  to  the  “good”  state 
after  repair,  one  obtains  a  perfect  observation  of  the  state  at  that  time,  and  our 
algorithm  uses  the  observation  at  the  next  time  to  estimate  the  parameter.  Hence, 
we  develop  parameter  estimation  techniques  based  on  the  information  available 
after  actions  that  reset  the  state  to  a  known  value.  At  these  times,  the  augmented 
state  process  “regenerates,”  its  future  evolution  becoming  independent  of  the  past. 
Using the ODE method, we show that two algorithms, one based on maximum likelihood
and another based on prediction error, converge almost surely to the true parameter
value. In addition, we modify the method of Shwartz and Makowski to prove optimality
of the resulting certainty equivalent adaptive policy, assuming only the existence of
some sequence of parameter estimates converging almost surely to the true parameter
value. Again, the discontinuities and partial observations in this problem preclude
the direct use of previously existing methods, but we have been able to generalize
the method to problems such as this. Also, we have avoided
the  very  strong  standard  assumption  that  the  parameter  estimates  converge  almost 
surely  to  the  true  parameter  value  under  any  stationary  policy.  In  [39]  and  [42],  we 
have  proved  some  initial  results  toward  the  more  difficult  analysis  of  such  adaptive 
control  problems,  but  in  which  the  parameter  estimates  are  updated  at  every  time; 
in  this  case,  the  regenerative  structure  used  in  [45],  [47]  is  not  present.  Also,  in  [5] 
our  recent  results  for  parameter-adaptive  Markov  decision  processes  (MDP’s)  are 
extended  to  partially  observable  MDP’s  depending  on  unknown  parameters.  These 
results  include  approximations  converging  uniformly  to  the  optimal  reward  function 
and  asymptotically  optimal  adaptive  policies. 
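
The regenerative idea in [45], [47] can be illustrated as follows: the estimate of
the deterioration probability is touched only at the epoch following a repair, when
the state is known to have been good. The Python sketch below makes the simplifying
assumption that the state one epoch after a repair is observed exactly, so each repair
cycle contributes one Bernoulli sample; in the actual problem that observation may be
noisy, and the maximum likelihood and prediction error estimators analyzed there are
more involved. A certainty equivalent policy would then act, at each time, as if the
current estimate theta_hat were the true parameter, for instance by using thresholds
computed for theta_hat.

```python
import random


class RegenerativeEstimator:
    """Estimate of theta = P(good -> bad in one epoch), updated only after repairs.

    Simplifying assumption: the state one epoch after a repair is observed exactly,
    so each repair cycle yields one Bernoulli(theta) outcome.
    """

    def __init__(self, theta0: float = 0.5) -> None:
        self.theta_hat = theta0
        self.cycles = 0  # number of repair cycles used so far

    def update_after_repair(self, went_bad: bool) -> float:
        self.cycles += 1
        outcome = 1.0 if went_bad else 0.0
        # Running average of the outcomes collected at regeneration times.
        self.theta_hat += (outcome - self.theta_hat) / self.cycles
        return self.theta_hat


def simulate_cycles(theta_true: float, n_cycles: int, seed: int = 1) -> float:
    """Feed simulated post-repair outcomes to the estimator and return the final estimate."""
    rng = random.Random(seed)
    est = RegenerativeEstimator()
    for _ in range(n_cycles):
        est.update_after_repair(rng.random() < theta_true)
    return est.theta_hat


print(simulate_cycles(0.2, 20_000))  # close to 0.2
```

Because the post-repair epochs are separated by regeneration times, the samples are
independent, which is what makes this update scheme easier to analyze than one that
updates at every time.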

Another  aspect  of  our  research  on  adaptive  control  has  involved  systems  with 
unknown disturbance distribution. In [6], we consider adaptive control of stochastic
systems in which the disturbance or driving process is a sequence of independent
identically distributed random variables with unknown distribution, under a
discounted reward criterion. Three different adaptive policies are shown to be
asymptotically optimal, and for each of them we obtain uniform approximations of the
optimal reward function. We have also obtained preliminary results on extending these
results to the situation in which only incomplete or noisy observations of the state
are available. In addition, in [17] we have extended the nonparametric results of [6]
to problems with incomplete state observations. Our approach combines convergence
results for empirical processes with recent results on parameter-adaptive stochastic
control problems. The important issue of implementation has been addressed in [18],
which presents finite-state discretization
procedures  for  discrete-time,  infinite  horizon,  adaptive  Markov  control  processes 
which  depend  on  unknown  parameters.  The  discretizations  are  combined  with  a 
consistent  parameter  estimation  scheme  to  obtain  uniform  approximations  to  the 
optimal  value  function  and  asymptotically  optimal  adaptive  control  policies. 
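
A minimal way to see how an unknown disturbance distribution enters adaptive control
is a plug-in scheme: replace the unknown distribution by the empirical distribution of
the disturbances observed so far, and solve the resulting discounted problem on a
finite state grid. The Python sketch below does this for a hypothetical inventory-type
system x' = max(x + a - w, 0) with capacity N, discount factor beta, and illustrative
cost coefficients; it minimizes discounted cost rather than maximizing reward, and it
is not one of the three policies analyzed in [6] or the discretization procedure of
[18].

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def empirical_pmf(samples: Sequence[int]) -> Dict[int, float]:
    """Empirical distribution of the observed i.i.d. disturbances."""
    counts = Counter(samples)
    n = len(samples)
    return {w: c / n for w, c in counts.items()}


def plug_in_value_iteration(pmf: Dict[int, float], capacity: int, beta: float = 0.9,
                            order_cost: float = 1.0, holding: float = 0.5,
                            shortage: float = 4.0, iters: int = 300
                            ) -> Tuple[List[float], List[int]]:
    """Discounted value iteration for x' = max(x + a - w, 0) on states 0..capacity,
    with the disturbance distribution replaced by `pmf`."""
    V = [0.0] * (capacity + 1)
    policy = [0] * (capacity + 1)
    for _ in range(iters):
        new_V = []
        for x in range(capacity + 1):
            best, best_a = float("inf"), 0
            for a in range(capacity - x + 1):   # order at most up to the capacity
                y = x + a                       # stock after ordering
                q = order_cost * a
                for w, pw in pmf.items():
                    nxt = max(y - w, 0)
                    q += pw * (holding * nxt + shortage * max(w - y, 0) + beta * V[nxt])
                if q < best:
                    best, best_a = q, a
            new_V.append(best)
            policy[x] = best_a
        V = new_V
    return V, policy


# Adaptive use: re-solve with the empirical distribution of the data seen so far.
observed = [0, 2, 1, 1, 3, 2, 1, 0, 2, 1]
V, policy = plug_in_value_iteration(empirical_pmf(observed), capacity=5)
print(policy)
```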

We have investigated the adaptive control of stochastic bilinear systems in [2] and
[34]. The minimum variance control law for bilinear systems with known parameters is
shown to yield, in most cases, controls with infinite variance; this calls into
question the use of the so-called bilinear self-tuning regulators. An adaptive
weighted minimum variance controller, based on a cost that includes weighted control
effort, is suggested for first order bilinear systems and is shown to yield
boundedness of the closed loop system variables under a certain condition on the
parameter estimate.
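
The difficulty with minimum variance control of bilinear systems can be seen on a
hypothetical first order model y_{t+1} = a*y_t + (b + c*y_t)*u_t + e_{t+1}, assumed
here only for illustration. Setting the one-step prediction to zero gives
u_t = -a*y_t/(b + c*y_t), which becomes arbitrarily large as b + c*y_t approaches
zero. Adding a penalty lam*u_t^2 with lam > 0 to the one-step cost gives
u_t = -g*a*y_t/(g^2 + lam), with g = b + c*y_t, and since |g|/(g^2 + lam) is at most
1/(2*sqrt(lam)), this control is bounded by |a*y_t|/(2*sqrt(lam)). The Python sketch
below compares the two laws with known parameters; in an adaptive version, a, b, c
would be replaced by their current estimates.

```python
import math


def mv_control(y: float, a: float, b: float, c: float) -> float:
    """Minimum variance law: makes the one-step prediction a*y + (b + c*y)*u zero."""
    g = b + c * y
    if g == 0.0:
        raise ZeroDivisionError("minimum variance law undefined when b + c*y == 0")
    return -a * y / g


def weighted_mv_control(y: float, a: float, b: float, c: float, lam: float) -> float:
    """Minimizes (a*y + g*u)**2 + lam*u**2 over u, where g = b + c*y and lam > 0."""
    g = b + c * y
    return -g * a * y / (g * g + lam)


def one_step_cost(u: float, y: float, a: float, b: float, c: float, lam: float) -> float:
    """Predicted squared output plus weighted control effort (noise variance omitted)."""
    return (a * y + (b + c * y) * u) ** 2 + lam * u ** 2


a, b, c, lam = 0.5, 1.0, 1.0, 0.1
y = -0.999                      # makes b + c*y nearly zero
print(mv_control(y, a, b, c))   # very large control
u = weighted_mv_control(y, a, b, c, lam)
print(u, abs(a * y) / (2 * math.sqrt(lam)))  # bounded control and its bound
```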

1.2.  Nonlinear  Systems. 

In order to deepen our insight into nonlinear systems, we have also investigated and
solved a number of problems in the linearization of discrete-time and discretized
nonlinear systems. In [1] and [30], necessary and sufficient conditions for
approximate linearizability are given. We also give a sufficient condition for local
linearizability. Finally, we present analogous results for multi-input nonlinear
discrete-time systems. In [3], necessary and sufficient conditions for local
input-output linearizability are given. We show that these conditions are also
sufficient for a formal solution to the global input-output linearization problem.
Finally, we show that the zeros at infinity of the system can be obtained by a
particular structure algorithm for locally input-output linearizable systems. Whereas
the objective of input-output linearizability is to make the input-dependent part of
the output sequence linear in the new input, that of immersion by nonsingular
feedback into a linear system (solved in [8], [35]) is to make the output sequence
jointly linear in the new input and some analytic function of the initial state.
Necessary and sufficient conditions for such immersion are given.

In [4], [31], we characterize the equivalence of single-input single-output
discrete-time nonlinear systems to linear ones, via a state-coordinate change and
with or without feedback. Four cases are distinguished by allowing or disallowing
feedback and by including the output map or not; the interdependence of these
problems is analyzed. An important feature that distinguishes these discrete-time
problems from the corresponding problems in continuous time is that the
state-coordinate transformation is here directly computable as a higher composition
of the system and output maps. Finally, certain connections are made with the
continuous-time case. We build on these results in [16], [36], [40], [41], in which
we investigate the effect of sampling on linear equivalence for continuous-time
systems. It is shown that the discretized system is linearizable by state coordinate
change for an open set of sampling times if and only if the continuous-time system is
linearizable by state coordinate change. Also, for n = 2, we show that even though
the discretized system is linearizable by state coordinate change and feedback, the
continuous-time affine complete analytic system is linearizable by state coordinate
change only. We also suggest a method of proof when n ≥ 3.

The papers [7], [32] investigate the global controllability of piecewise-linear
(hypersurface) systems, which are defined as control systems that are subject to
affine dynamics on each of the components of a finite polyhedral partition. Various
new tools are developed for the study of the problem, including a classification of
the facets of the polyhedra in the partition. Necessary and sufficient conditions for
complete controllability are obtained via the study of a suitably defined
controllability connection matrix of polyhedra. In [9] and [37], we investigate the
problem of smooth feedback stabilization of nonlinear systems with stable uncontrolled
dynamics. We present sufficient conditions for the existence of a smooth feedback
stabilizing  control  that  are  also  necessary  in  the  case  of  linear  systems.  Analogous 
results  are  established  for  discrete  time  systems. 
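
A piecewise-linear system of the kind studied in [7], [32] can be represented as a
list of polyhedra {x : Hx <= h}, each carrying its own affine dynamics. The Python
sketch below assumes a discrete-time update x' = A_i x + B_i u + c_i for readability
(a continuous-time version would integrate dx/dt = A_i x + B_i u + c_i instead) and
resolves points on a shared facet by taking the first region listed, which is an
arbitrary convention. It only simulates the system; the controllability test via the
connection matrix of polyhedra is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


@dataclass
class Region:
    H: Matrix                 # polyhedron {x : H x <= h}
    h: Sequence[float]
    A: Matrix                 # affine dynamics valid inside the polyhedron
    B: Matrix
    c: Sequence[float]

    def contains(self, x: Sequence[float], tol: float = 1e-12) -> bool:
        return all(hx <= hi + tol for hx, hi in zip(matvec(self.H, x), self.h))


def pwa_step(regions: Sequence[Region], x: Sequence[float], u: Sequence[float]) -> List[float]:
    """One step of x' = A_i x + B_i u + c_i, where i is the first region containing x."""
    for r in regions:
        if r.contains(x):
            return [ax + bu + ci for ax, bu, ci in zip(matvec(r.A, x), matvec(r.B, u), r.c)]
    raise ValueError("x lies outside the polyhedral partition")


# Hypothetical 1-D example: contracting dynamics for x <= 0, expanding for x >= 0.
regions = [
    Region(H=[[1.0]], h=[0.0], A=[[0.5]], B=[[1.0]], c=[0.0]),
    Region(H=[[-1.0]], h=[0.0], A=[[2.0]], B=[[1.0]], c=[-1.0]),
]
print(pwa_step(regions, [-2.0], [1.0]))  # [0.0]
print(pwa_step(regions, [3.0], [0.0]))   # [5.0]
```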

1.3.  Deterministic  Nonlinear  Adaptive  Control. 

Almost  all  of  the  work  in  the  field  of  deterministic  adaptive  control  is  restricted 
to  the  study  of  linear  plants.  In  trying  to  extend  adaptive  schemes  to  nonlinear 
systems,  one  is  faced  with  considerable  obstacles.  The  most  important  of  these  is 
the  lack  of  a  systematic  methodology  for  nonlinear  feedback  design.  In  recent  years, 
considerable  effort  has  been  invested  in  the  study  of  canonical  forms  for  nonlinear 
systems  and  in  particular  the  characterization  of  the  class  of  those  systems  which 
are  linearizable  under  the  action  of  the  nonlinear  feedback  group.  Equivalence  to 
linear  dynamics  is  a  particularly  desirable  property  from  the  point  of  view  of  control 
synthesis. 

A  possible  design  methodology,  applicable  to  linearizable  systems,  is  to  build  a 
controller  for  the  nonlinear  system  by  designing  one  for  the  equivalent  linear  system 
and  utilizing  the  transformation  (from  linear  to  nonlinear)  along  with  its  inverse. 
This  approach  has  already  been  applied  to  the  design  of  automatic  flight-control 
systems  for  aircraft  of  significant  complexity.  The  chief  drawback  of  this  method 
is  that  it  relies  on  all  the  states  being  measured  and  on  an  exact  cancellation  of 
nonlinear  terms  in  order  to  get  linear  behavior.  Consequently,  in  the  case  where  the 
plant  contains  unknown  or  uncertain  parameters,  adaptation  is  desirable  in  order 
to  robustify,  i.e.,  make  asymptotically  exact,  the  cancellation  of  nonlinear  terms.  A 
major  difficulty  here  is  that  the  linearizing  transformation,  being  a  function  of  the 
system  parameters,  is  itself  unknown,  and  hence,  the  above  design  approach  does  not 
allow for a straightforward incorporation of an adaptive controller. The extension of
parameter  adaptive  algorithms  developed  for  linear  systems  to  “linearizable”  ones 
becomes,  therefore,  an  important  problem. 
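
The role of adaptation in making the cancellation of nonlinear terms asymptotically
exact can be seen in a standard textbook scalar example, which is much simpler than
the pure-feedback systems of [12], [33]: for dx/dt = theta*f(x) + u with theta
unknown, use u = -theta_hat*f(x) - k*x and the update d(theta_hat)/dt =
gamma*f(x)*x. With V = x^2/2 + (theta - theta_hat)^2/(2*gamma), one gets
dV/dt = -k*x^2, from which a standard argument shows that x tends to zero, even
though theta_hat need not converge to theta. Here the linearizing transformation is
the identity, so the difficulty discussed above (an unknown, parameter-dependent
transformation) does not arise. The Python sketch below integrates this loop with the
Euler method, with f(x) = x^2 and all numbers chosen only for illustration.

```python
def simulate_adaptive_cancellation(theta: float = 1.0, k: float = 2.0, gamma: float = 1.0,
                                   x0: float = 1.0, theta_hat0: float = 0.0,
                                   t_end: float = 10.0, dt: float = 1e-3):
    """Euler simulation of dx/dt = theta*f(x) + u with u = -theta_hat*f(x) - k*x
    and d(theta_hat)/dt = gamma*f(x)*x, for the illustrative choice f(x) = x**2."""
    def f(x: float) -> float:
        return x * x

    x, theta_hat = x0, theta_hat0
    for _ in range(int(t_end / dt)):
        u = -theta_hat * f(x) - k * x      # estimated cancellation plus linear feedback
        dx = theta * f(x) + u              # true plant (theta unknown to the controller)
        dtheta_hat = gamma * f(x) * x      # Lyapunov-based parameter update
        x += dt * dx
        theta_hat += dt * dtheta_hat
    return x, theta_hat


x_final, theta_hat_final = simulate_adaptive_cancellation()
print(x_final, theta_hat_final)
```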

In a study [12], [33] that seems to be among the first of its kind, we restricted our
attention  to  “pure  feedback”  systems,  a  special  class  of  nonlinear  systems  which  arise 
as  a  canonical  form  of  linearizable  dynamics.  We  presented  an  adaptive  algorithm 
and  the  design  of  a  model  reference  adaptive  controller  for  this  class  of  problems. 
An  interesting  feature  of  our  adaptive  scheme  is  that  it  updates  estimates  of  the 
feedback  and  coordinate  transformation  required  to  linearize  the  system.  Under 
some  mild  technical  conditions,  we  established  global  convergence  of  the  output 
error  for  all  initial  estimates  of  the  parameter  vector  lying  in  an  open  neighborhood 
of  the  true  parameters  in  the  parameter  space.  Also,  in  simulation  studies,  the 
performance of the algorithm was excellent. At first sight, this model might seem to
describe a fairly restricted class of nonlinear plants. One should keep in mind, though, that
not  only  does  this  model  cover  a  wide  range  of  interesting  real  life  applications,  but, 
in  addition,  the  pure-feedback  form  may  be  viewed  as  a  canonical  form  of  feedback 
linearizable  nonlinear  systems. 

1.4.  Other  Related  Research. 

We  have  also  made  progress  in  a  number  of  other  related  areas  of  research.  In 
the area of robotics, the problem of selecting joint space trajectories for redundant
manipulators  is  considered  in  [13].  Solutions  which  allow  secondary  tasks  to  be 
performed  by  the  arm  simultaneously  with  end-effector  motions  may  be  selected  in 
a  number  of  ways.  An  algorithm  to  accomplish  this  by  means  of  conditions  on  a 
scalar  function  of  the  joint  variables  is  introduced  and  analyzed  in  [13].  In  [14], 
the problem of the distribution of dynamic loads for multiple cooperating manipulators
is considered. Methods of load distribution, which allow desired object
motion  while  selecting  loads  desirable  for  alleviating  manipulator  dynamic  loads, 
are  developed.  The  motion  and  internal  loads  induced  on  an  object  manipulated  by 
two  or  more  robotic  mechanisms  are  considered  in  [28].  In  particular,  for  a  desired 
motion  trajectory  of  the  object,  the  question  of  load  distribution  among  the  arms 
is  analyzed,  with  particular  attention  given  to  the  internal  loading  of  the  object.  A 
new  representation  of  the  load  distribution  problem  is  given  by  the  introduction  of 
a  particular  “non-squeezing”  pseudoinverse,  which  is  shown  to  possess  properties 
which expose the underlying structure of the problem. It is expected that by using
this  pseudoinverse,  new  insight  will  be  gained,  and  necessary  analysis  simplified,  in 
a  number  of  aspects  of  multiple  manipulator  research.  A  number  of  these  aspects 
are  detailed  and  illustrated  using  a  two  armed  example. 

In the area of discrete event dynamic systems, we have designed algorithms for
supervisor synthesis problems with partial observations [15]. These algorithms
provide a good suboptimal solution to the problem; in addition, they involve new
classes of automata that are of interest in their own right. However, these solutions
are often too restrictive, and in [20] we have studied a more general class of
solutions. These give rise to another interesting class of supervisors, but they are
computationally much more difficult. In [21], we discuss the computation of supremal
controllable and normal sublanguages. We derive formulas for both supremal
controllable  sublanguages  and  supremal  normal  sublanguages  when  the  languages 
involved  are  closed.  As  a  result,  those  languages  can  be  computed  without  applying 
recursive  algorithms. 
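
For prefix-closed languages, one closed-form expression for the supremal controllable
sublanguage is sup C(K) = K - ((L - K)/Sigma_uc*)Sigma*, where L is the closed plant
language, K (a closed subset of L) is the specification, Sigma_uc is the set of
uncontrollable events, and / denotes the right quotient: one removes every string of
K from which an uncontrollable continuation can leave K while staying in L, together
with all of its extensions. The Python sketch below evaluates this expression
directly for finite languages stored as sets of event tuples; it illustrates why no
fixed-point iteration is needed in this case and does not reproduce the derivation or
notation of [21].

```python
from typing import AbstractSet, Iterable, Set, Tuple

Word = Tuple[str, ...]


def prefix_closure(words: Iterable[Word]) -> Set[Word]:
    """All prefixes (including the empty word) of the given words."""
    return {w[:i] for w in words for i in range(len(w) + 1)}


def sup_controllable_closed(K: Set[Word], L: Set[Word],
                            uncontrollable: AbstractSet[str]) -> Set[Word]:
    """sup C(K) = K - ((L - K) / Sigma_uc*) Sigma*, for finite prefix-closed K within L."""
    outside = L - K
    # (L - K) / Sigma_uc*: strings of K with an uncontrollable-only continuation into L - K.
    bad = {
        s for s in K for t in outside
        if len(t) > len(s) and t[:len(s)] == s
        and all(e in uncontrollable for e in t[len(s):])
    }
    # Appending Sigma*: drop every string that has a bad prefix.
    return {s for s in K if not any(s[:i] in bad for i in range(len(s) + 1))}


# Plant: "a u b" or "b u"; event u is uncontrollable; the specification forbids "a u".
L = prefix_closure({("a", "u", "b"), ("b", "u")})
K = prefix_closure({("a",), ("b", "u")})
print(sorted(sup_controllable_closed(K, L, {"u"})))  # [(), ('b',), ('b', 'u')]
```

Here the string "a" is removed because the uncontrollable event u can follow it and
lead outside the specification, so a supervisor must disable a itself.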

Periodic  orbits  of  the  matrix  Riccati  equation  are  studied  in  [10];  it  is  shown 
that  periodic  solutions  are  bounded  if  and  only  if  the  span  of  their  range  does  not 
intersect  the  orthogonal  complement  of  the  controllable  subspace  of  the  associated 
linear  system.  In  [11],  a  discrete-time,  linear,  time-invariant  control  system  with  a 
fixed  time  delay  in  the  feedback  loop  is  considered;  simple  necessary  and  sufficient 
conditions  for  feedback  stabilization  are  developed.  Based  on  a  minimax  criterion, 
we define in [22] the concept of equalizability for a nonlinear, discrete-time
communication channel. Sufficient conditions for a channel to be equalizable via a
finite memory equalizer are also derived.
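
For a scalar instance of the delayed-feedback setting of [11], stability can be
checked numerically. With x_{k+1} = a*x_k + b*u_{k-d} and static feedback
u_k = f*x_k, the closed loop is x_{k+1} = a*x_k + b*f*x_{k-d}, whose characteristic
polynomial is z^(d+1) - a*z^d - b*f; the loop is asymptotically stable exactly when
all roots lie strictly inside the unit circle. The Python sketch below (using NumPy)
applies this root test; it is a numerical check for a hypothetical scalar plant, not
the necessary and sufficient conditions derived in [11].

```python
import numpy as np


def closed_loop_stable(a: float, b: float, f: float, d: int) -> bool:
    """Root test for x_{k+1} = a*x_k + b*f*x_{k-d} (scalar plant, u_k = f*x_k, delay d >= 0)."""
    coeffs = np.zeros(d + 2)
    coeffs[0] = 1.0          # z^(d+1)
    coeffs[1] -= a           # -a * z^d
    coeffs[d + 1] -= b * f   # -b*f constant term; for d = 0 it combines with -a
    return bool(np.max(np.abs(np.roots(coeffs))) < 1.0)


# Unstable plant a = 1.5: the deadbeat gain f = -1.5 works without delay but not with d = 1.
print(closed_loop_stable(1.5, 1.0, -1.5, d=0))  # True
print(closed_loop_stable(1.5, 1.0, -1.5, d=1))  # False
print(closed_loop_stable(1.5, 1.0, -0.6, d=1))  # True
```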